Methods and compositions for producing labeled probe nucleic acids for use in array based comparative genomic hybridization applications

ABSTRACT

Methods and compositions for producing labeled probe nucleic acids from genomic nucleic acid template are provided. In the subject methods, a conserved coding consensus region primer is employed to enzymatically generate a select set of labeled probe nucleic acids corresponding to coding regions of genes from a genomic template via a primer extension protocol. The subject methods find use in a variety of different applications, and are particularly suited for use in the preparation of labeled probe nucleic acids for use in array based comparative genomic hybridization applications. Also provided are kits for use in practicing the subject methods.

TECHNICAL FIELD

The technical field of this invention is comparative genomic hybridization (CGH).

BACKGROUND OF THE INVENTION

Many genomic and genetic studies are directed to the identification of differences in gene dosage or expression among cell populations for the study and detection of disease. For example, many malignancies involve the gain or loss of DNA sequences resulting in activation of oncogenes or inactivation of tumor suppressor genes. Identification of the genetic events leading to neoplastic transformation and subsequent progression can facilitate efforts to define the biological basis for disease, improve prognostication of therapeutic response, and permit earlier tumor detection. In addition, perinatal genetic problems frequently result from loss or gain of chromosome segments, as in trisomy 21 or the microdeletion syndromes. Thus, methods of prenatal detection of such abnormalities can be helpful in early diagnosis of disease.

Comparative genomic hybridization (CGH) is one approach that has been employed to detect the presence and identify the location of amplified or deleted sequences. CGH reveals increases and decreases irrespective of genome rearrangement. In one implementation of CGH, genomic DNA is isolated from normal reference cells, as well as from test cells (e.g., tumor cells). The two nucleic acids are differentially labeled and then hybridized in situ to metaphase chromosomes of a reference cell. The repetitive sequences in both the reference and test DNAs are either removed or their hybridization capacity is reduced by some means. Chromosomal regions in the test cells which are at increased or decreased copy number can be identified by detecting regions where the ratio of signal from the two DNAs is altered. For example, those regions that have been decreased in copy number in the test cells will show relatively lower signal from the test DNA than the reference compared to other regions of the genome. Regions that have been increased in copy number in the test cells will show relatively higher signal from the test DNA.
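The ratio logic described above can be illustrated with a short Python sketch. It is not part of the described method: it assumes background-subtracted, strictly positive per-region signals for the test and reference DNAs, normalizes each channel by its median so that an unchanged region scores a log2 ratio near 0, and uses a hypothetical threshold of ±0.3 to flag gains and losses. The function names are illustrative only.

```python
import math
from statistics import median


def log2_ratios(test, reference):
    """Median-normalized log2(test/reference) ratio for each genomic region.

    Assumes background-subtracted, strictly positive signals. Dividing each
    channel by its own median makes a typical (unchanged) region score 0.
    """
    t_med = median(test)
    r_med = median(reference)
    return [math.log2((t / t_med) / (r / r_med)) for t, r in zip(test, reference)]


def call_copy_number(ratios, threshold=0.3):
    """Label each region 'gain', 'loss' or 'normal' (threshold is hypothetical)."""
    calls = []
    for r in ratios:
        if r > threshold:
            calls.append("gain")
        elif r < -threshold:
            calls.append("loss")
        else:
            calls.append("normal")
    return calls


# Hypothetical signals: the third region is amplified, the fifth is reduced
test_signal = [1000, 1020, 2000, 990, 500]
reference_signal = [1000, 1000, 1000, 1000, 1000]
print(call_copy_number(log2_ratios(test_signal, reference_signal)))
# ['normal', 'normal', 'gain', 'normal', 'loss']
```

A doubling of test signal relative to reference gives a log2 ratio of +1 and a halving gives -1, which is why log ratios are a convenient symmetric scale for gains and losses.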

In a recent variation of the above traditional CGH approach, the immobilized chromosome element has been replaced with a collection of solid support bound target nucleic acids, e.g., an array of cDNAs. Such approaches offer benefits over immobilized chromosome approaches, but introduce new problems. For example, only a small percentage of the genome is represented in the collection of solid support bound targets and, therefore, only a small percentage of the labeled probe material actually hybridizes to the immobilized targets, which results in low signal intensities for genome-derived probe nucleic acid populations.

Accordingly, there is interest in the development of improved array based CGH protocols.

Relevant Literature

United States Patents of interest include: U.S. Pat. Nos. 6,335,167; 6,197,501; 5,830,645; and 5,665,549.

SUMMARY OF THE INVENTION

Methods and compositions for producing labeled probe nucleic acids from genomic nucleic acid template are provided. In the subject methods, a conserved coding consensus region primer is employed to enzymatically generate a select set of labeled probe nucleic acids corresponding to coding regions of genes from a genomic template via a primer extension protocol. Examples of primers that may be employed in the subject methods include a family of primers containing the 5′-NnATGNn-3′ sequence or primers recognizing consensus splice sites such as 5′-ACTTACCTN-3′ or 5′-NnAGGNn-3′. The subject methods result in a lower complexity labeled probe population, which provides an equivalent ratio of specific to non-specific signal at lower stringency, thus resulting in higher signal and improved signal to background. The subject methods find use in a variety of different applications, and are particularly suited for use in the preparation of labeled probe nucleic acids for use in array based comparative genomic hybridization applications. Also provided are kits for use in practicing the subject methods.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 provides a schematic representation of a method according to the subject invention.

FIGS. 2A and 2B provide schematic representations of the use of different types of primers according to an embodiment of the subject invention.

DEFINITIONS

The term “nucleic acid” as used herein means a polymer composed of nucleotides, e.g., deoxyribonucleotides or ribonucleotides, or compounds produced synthetically (e.g., PNA as described in U.S. Pat. No. 5,948,902 and the references cited therein) which can hybridize with naturally occurring nucleic acids in a sequence specific manner analogous to that of two naturally occurring nucleic acids, e.g., can participate in Watson-Crick base pairing interactions.

The terms “ribonucleic acid” and “RNA” as used herein mean a polymer composed of ribonucleotides.

The terms “deoxyribonucleic acid” and “DNA” as used herein mean a polymer composed of deoxyribonucleotides.

The term “oligonucleotide” as used herein denotes single stranded nucleotide multimers of from about 10 to 100 nucleotides and up to 200 nucleotides in length.

The term “polynucleotide” as used herein refers to a single or double stranded polymer composed of nucleotide monomers of generally greater than 100 nucleotides in length.

The term “functionalization” as used herein relates to modification of a solid substrate to provide a plurality of functional groups on the substrate surface. By a “functionalized surface” as used herein is meant a substrate surface that has been modified so that a plurality of functional groups are present thereon.

The term “array” encompasses the term “microarray” and refers to an ordered array presented for binding to ligands such as polymers, polynucleotides, peptide nucleic acids and the like.

The terms “reactive site”, “reactive functional group” or “reactive group” refer to moieties on a monomer, polymer or substrate surface that may be used as the starting point in a synthetic organic process. This is contrasted to “inert” hydrophilic groups that could also be present on a substrate surface, e.g., hydrophilic sites associated with polyethylene glycol, a polyamide or the like.

The term “oligomer” is used herein to indicate a chemical entity that contains a plurality of monomers. As used herein, the terms “oligomer” and “polymer” are used interchangeably, as it is generally, although not necessarily, smaller “polymers” that are prepared using the functionalized substrates of the invention, particularly in conjunction with combinatorial chemistry techniques. Examples of oligomers and polymers include polydeoxyribonucleotides (DNA), polyribonucleotides (RNA), other polynucleotides which are C-glycosides of a purine or pyrimidine base, polypeptides (proteins), polysaccharides (starches, or polysugars), and other chemical entities that contain repeating units of like chemical structure. In the practice of the instant invention, oligomers will generally comprise about 2-50 monomers, preferably about 2-20, more preferably about 3-10 monomers.

The term “ligand” as used herein refers to a moiety that is capable of covalently or otherwise chemically binding a compound of interest. The arrays of solid-supported ligands produced by the methods can be used in screening or separation processes, or the like, to bind a component of interest in a sample. The term “ligand” in the context of the invention may or may not be an “oligomer” as defined above. However, the term “ligand” as used herein may also refer to a compound that is “pre-synthesized” or obtained commercially, and then attached to the substrate.

The term “sample” as used herein relates to a material or mixture of materials, typically, although not necessarily, in fluid form, containing one or more components of interest.

The terms “nucleoside” and “nucleotide” are intended to include those moieties which contain not only the known purine and pyrimidine bases, but also other heterocyclic bases that have been modified. Such modifications include methylated purines or pyrimidines, acylated purines or pyrimidines, alkylated riboses or other heterocycles. In addition, the terms “nucleoside” and “nucleotide” include those moieties that contain not only conventional ribose and deoxyribose sugars, but other sugars as well. Modified nucleosides or nucleotides also include modifications on the sugar moiety, e.g., wherein one or more of the hydroxyl groups are replaced with halogen atoms or aliphatic groups, or are functionalized as ethers, amines, or the like.

An “array” includes any two-dimensional or substantially two-dimensional (as well as a three-dimensional) arrangement of addressable regions bearing a particular chemical moiety or moieties (e.g., biopolymers such as polynucleotide or oligonucleotide sequences (nucleic acids), polypeptides (e.g., proteins), carbohydrates, lipids, etc.) associated with that region. In the broadest sense, the preferred arrays are arrays of polymeric binding agents, where the polymeric binding agents may be any of: polypeptides, proteins, nucleic acids, polysaccharides, synthetic mimetics of such biopolymeric binding agents, etc. In many embodiments of interest, the arrays are arrays of nucleic acids, including oligonucleotides, polynucleotides, cDNAs, mRNAs, synthetic mimetics thereof, and the like. Where the arrays are arrays of nucleic acids, the nucleic acids may be covalently attached to the arrays at any point along the nucleic acid chain, but are generally attached at one of their termini (e.g., the 3′ or 5′ terminus). Sometimes, the arrays are arrays of polypeptides, e.g., proteins or fragments thereof.

Any given substrate may carry one, two, four or more arrays disposed on a front surface of the substrate. Depending upon the use, any or all of the arrays may be the same or different from one another and each may contain multiple spots or features. A typical array may contain more than ten, more than one hundred, more than one thousand, or more than ten thousand features, or even more than one hundred thousand features, in an area of less than 20 cm² or even less than 10 cm². For example, features may have widths (that is, diameter, for a round spot) in the range from 10 μm to 1.0 cm. In other embodiments each feature may have a width in the range of 1.0 μm to 1.0 mm, usually 5.0 μm to 500 μm, and more usually 10 μm to 200 μm. Non-round features may have area ranges equivalent to that of circular features with the foregoing width (diameter) ranges. At least some, or all, of the features are of different compositions (for example, when any repeats of each feature composition are excluded the remaining features may account for at least 5%, 10%, or 20% of the total number of features). Interfeature areas will typically (but not necessarily) be present which do not carry any polynucleotide (or other biopolymer or chemical moiety of a type of which the features are composed). Such interfeature areas typically will be present where the arrays are formed by processes involving drop deposition of reagents but may not be present when, for example, photolithographic array fabrication processes are used. It will be appreciated, though, that the interfeature areas, when present, could be of various sizes and configurations.

Each array may cover an area of less than 100 cm², or even less than 50 cm², 10 cm² or 1 cm². In many embodiments, the substrate carrying the one or more arrays will be shaped generally as a rectangular solid (although other shapes are possible), having a length of more than 4 mm and less than 1 m, usually more than 4 mm and less than 600 mm, more usually less than 400 mm; a width of more than 4 mm and less than 1 m, usually less than 500 mm and more usually less than 400 mm; and a thickness of more than 0.01 mm and less than 5.0 mm, usually more than 0.1 mm and less than 2 mm and more usually more than 0.2 mm and less than 1 mm. With arrays that are read by detecting fluorescence, the substrate may be of a material that emits low fluorescence upon illumination with the excitation light. Additionally in this situation, the substrate may be relatively transparent to reduce the absorption of the incident illuminating laser light and subsequent heating if the focused laser beam travels too slowly over a region. For example, the substrate may transmit at least 20%, or 50% (or even at least 70%, 90%, or 95%), of the illuminating light incident on the front as may be measured across the entire integrated spectrum of such illuminating light or alternatively at 532 nm or 633 nm.

Arrays can be fabricated using drop deposition from pulse-jets of either polynucleotide precursor units (such as monomers) in the case of in situ fabrication, or the previously obtained polynucleotide. Such methods are described in detail in, for example, the previously cited references including U.S. Pat. No. 6,242,266, U.S. Pat. No. 6,232,072, U.S. Pat. No. 6,180,351, U.S. Pat. No. 6,171,797, U.S. Pat. No. 6,323,043, U.S. patent application Ser. No. 09/302,898 filed Apr. 30, 1999 by Caren et al., and the references cited therein. As already mentioned, these references are incorporated herein by reference. Other drop deposition methods can be used for fabrication, as previously described herein. Also, instead of drop deposition methods, photolithographic array fabrication methods may be used such as described in U.S. Pat. No. 5,599,695, U.S. Pat. No. 5,753,788, and U.S. Pat. No. 6,329,143. Interfeature areas need not be present particularly when the arrays are made by photolithographic methods as described in those patents.

An array is “addressable” when it has multiple regions of different moieties (e.g., different polynucleotide sequences) such that a region (i.e., a “feature” or “spot” of the array) at a particular predetermined location (i.e., an “address”) on the array will detect a particular target or class of targets (although a feature may incidentally detect non-targets of that feature). Array features are typically, but need not be, separated by intervening spaces. In the case of an array, the “target” will be referenced as a moiety in a mobile phase (typically fluid), to be detected by probes (“target probes”) which are bound to the substrate at the various regions. However, either of the “target” or “target probe” may be the one which is to be evaluated by the other (thus, either one could be an unknown mixture of polynucleotides to be evaluated by binding with the other).

A “scan region” refers to a contiguous (preferably, rectangular) area in which the array spots or features of interest, as defined above, are found. The scan region is that portion of the total area illuminated from which the resulting fluorescence is detected and recorded. For the purposes of this invention, the scan region includes the entire area of the slide scanned in each pass of the lens, between the first feature of interest and the last feature of interest, even if there exist intervening areas which lack features of interest. An “array layout” refers to one or more characteristics of the features, such as feature positioning on the substrate, one or more feature dimensions, and an indication of a moiety at a given location. “Hybridizing” and “binding”, with respect to polynucleotides, are used interchangeably.

By “remote location,” it is meant a location other than the location at which the array is present and hybridization occurs. For example, a remote location could be another location (e.g., office, lab, etc.) in the same city, another location in a different city, another location in a different state, another location in a different country, etc. As such, when one item is indicated as being “remote” from another, what is meant is that the two items are at least in different rooms or different buildings, and may be at least one mile, ten miles, or at least one hundred miles apart. “Communicating” information references transmitting the data representing that information as electrical signals over a suitable communication channel (e.g., a private or public network). “Forwarding” an item refers to any means of getting that item from one location to the next, whether by physically transporting that item or otherwise (where that is possible) and includes, at least in the case of data, physically transporting a medium carrying the data or communicating the data. An array “package” may be the array plus only a substrate on which the array is deposited, although the package may include other features (such as a housing with a chamber). A “chamber” references an enclosed volume (although a chamber may be accessible through one or more ports). It will also be appreciated that throughout the present application, words such as “top,” “upper,” and “lower” are used in a relative sense only.

The term “stringent hybridization conditions” as used herein refers to conditions that are compatible with producing duplexes on an array surface between complementary binding members, i.e., between probes and complementary targets in a sample, e.g., duplexes of nucleic acid probes, such as DNA probes, and their corresponding nucleic acid targets that are present in the sample, e.g., their corresponding mRNA analytes present in the sample. An example of stringent hybridization conditions is hybridization at 60° C. or higher and 3×SSC (450 mM sodium chloride/45 mM sodium citrate). Another example of stringent hybridization conditions is incubation at 42° C. in a solution containing 30% formamide, 1 M NaCl, 0.5% sodium sarcosine, 50 mM MES, pH 6.5. Stringent hybridization conditions are hybridization conditions that are at least as stringent as the above representative conditions, where conditions are considered to be at least as stringent if they are at least about 80% as stringent, typically at least about 90% as stringent, as the above specific stringent conditions. Other stringent hybridization conditions are known in the art and may also be employed, as appropriate.
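The 3×SSC example can be checked with simple arithmetic, since SSC buffer scales linearly from 1×SSC (150 mM sodium chloride, 15 mM sodium citrate). The Python sketch below is illustrative only: the helper names are hypothetical, and the 20× stock used in the usage line is an assumed, commonly prepared concentrate rather than something this specification prescribes.

```python
# Composition of 1x SSC (saline-sodium citrate) buffer, in mM
NACL_1X_MM = 150.0
CITRATE_1X_MM = 15.0


def ssc_composition(x_strength):
    """Return (NaCl mM, sodium citrate mM) for SSC at the given x-strength."""
    return x_strength * NACL_1X_MM, x_strength * CITRATE_1X_MM


def dilution_volume_ml(stock_x, final_x, final_volume_ml):
    """Volume of SSC stock needed for final_volume_ml at final_x (C1V1 = C2V2)."""
    if final_x > stock_x:
        raise ValueError("final strength cannot exceed stock strength")
    return final_x * final_volume_ml / stock_x


print(ssc_composition(3))              # (450.0, 45.0), the 3xSSC example above
print(dilution_volume_ml(20, 3, 100))  # 15.0 mL of an assumed 20x stock per 100 mL
```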

DESCRIPTION OF THE SPECIFIC EMBODIMENTS

Methods and compositions for producing labeled probe nucleic acids from genomic nucleic acid template are provided. In the subject methods, a conserved coding consensus region primer, i.e., a primer recognizing a translation start codon or an exon/intron junction, or a set of such primers is employed to enzymatically generate labeled probe nucleic acids from a genomic template via a primer extension protocol. The subject methods find use in a variety of different applications, and are particularly suited for use in the preparation of labeled probe nucleic acids for use in array based comparative genomic hybridization applications. Also provided are kits for use in practicing the subject methods.

Before the subject invention is described further, it is to be understood that the invention is not limited to the particular embodiments of the invention described below, as variations of the particular embodiments may be made and still fall within the scope of the appended claims. It is also to be understood that the terminology employed is for the purpose of describing particular embodiments, and is not intended to be limiting. Instead, the scope of the present invention will be established by the appended claims.

In this specification and the appended claims, the singular forms “a,” “an” and “the” include plural reference unless the context clearly dictates otherwise. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood to one of ordinary skill in the art to which this invention belongs.

Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range, and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood to one of ordinary skill in the art to which this invention belongs. Although any methods, devices and materials similar or equivalent to those described herein can be used in the practice or testing of the invention, the preferred methods, devices and materials are now described.

All publications mentioned herein are incorporated herein by reference for the purpose of describing and disclosing the invention components that are described in the publications which might be used in connection with the presently described invention.

As summarized above, the present invention provides methods of producing labeled nucleic acids from genomic template nucleic acid using a conserved primer or set of primers recognizing a translation start codon and/or an exon/intron junction, i.e., a conserved coding consensus region primer, as well as kits for use in practicing the subject methods. In further describing the present invention, the subject methods are discussed first in greater detail, followed by a review of representative kits for use in practicing the subject methods.

Methods

The subject invention provides methods for generating labeled probe nucleic acids from a genomic template, where a feature of the subject methods is the use of a conserved coding consensus sequence primer, usually in the form of a select set of oligonucleotides representing conserved sequences in genes, in a primer extension protocol.

In practicing the subject methods, the first step is to provide a genomic template. By genomic template is meant the nucleic acids that are used as template in the primer extension reactions described in the following sections. In many embodiments, the genomic template is a population of genomic deoxyribonucleic acid molecules, where by population is meant a collection of molecules in which at least two constituent members have nucleotide sequences that differ from each other, e.g., by at least about 1 base pair, by at least about 5 base pairs, by at least about 10 base pairs, by at least about 50 base pairs, by at least about 100 base pairs, by at least about 1 kb, by at least about 10 kb, etc.

The number of distinct sequences in a population of molecules making up a given genomic template is typically at least 2, usually at least 10 and more usually at least 50, and may be 1000, 5000, 10000, 100000 or higher.

The genomic template may be prepared using any convenient protocol. In many embodiments, the genomic template is prepared by first obtaining a source of genomic DNA, e.g., a nuclear fraction of a cell lysate, where any convenient means for obtaining such a fraction may be employed and numerous protocols for doing so are well known in the art. The genomic template may be genomic DNA representing the entire genome from a particular organism, tissue or cell type or may comprise a portion of the genome, such as a single chromosome. Genomic template may be prepared from a subject, for example a plant or an animal, that is suspected of being homozygous or heterozygous for a deletion or amplification of a genomic region. In many embodiments, the average size of the constituent molecules that make up the genomic template does not exceed about 10 kb in length, typically does not exceed about 8 kb in length and sometimes does not exceed about 5 kb in length, such that the average length of molecules in a given genomic template composition may range from about 1 kb to about 10 kb, usually from about 5 kb to about 8 kb in certain embodiments. The genomic template may be prepared from an initial chromosomal source by fragmenting the source into the genomic template having molecules of the desired size range, where fragmentation may be achieved using any convenient protocol, including but not limited to: mechanical protocols, e.g., sonication, shearing, etc.; and chemical or enzymatic protocols, e.g., enzyme digestion, etc.

Following preparation of the genomic template, as described above, the prepared genomic template is employed in the preparation of labeled probe nucleic acids in a protocol in which at least one primer, and often a mixture of different primers, typically greater than 6 nucleotides in length, is employed, where a feature of the employed primers is that they include a sequence of nt residues that is a conserved coding consensus region. By conserved coding consensus region is meant a domain or stretch of nt residues having a sequence that hybridizes (typically under stringent conditions) to, i.e., is the exact complement of, a translation start codon or an exon/intron junction, e.g., as found in a 5′ or 3′ splice site. As such, of interest are (N)nATG(N)n containing primers and consensus splice site containing primers, each of which is now described in greater detail below.

The primers employed in the subject methods are typically at least about 6 nt in length. In many embodiments, an oligonucleotide primer employed in the subject methods is one that ranges in length from about 3 to about 25 nt, sometimes from about 5 to about 20 nt and sometimes from about 5 to about 10 nt. By (N)nATG(N)n oligonucleotide primer is meant an oligonucleotide that includes within its length an ATG codon. In other words, random nucleotides may be present at the 5′ end, the 3′ end, or both, in order to create a primer greater than 6 nucleotides in length.

In certain embodiments, the primer further includes a 5′ domain of random sequence, which random domain may be made up of one or more nucleotide residues of any base, e.g., degenerate bases, universal bases, modified bases, etc. In certain embodiments, the random sequence domain is made up of from about 1 to 10 nt, usually from about 2 to 8 nt, including 3, 4, 5, or 6 nt, etc. In certain embodiments, the random domain is a domain where all possible variations of this random sequence are represented in the primer mix. For example, in certain embodiments where the random domain is denoted NNNNNN, this representation is intended to indicate that A, G, C, or T can appear at any position, and therefore the six random nucleotides of the primers in the set represent all 4096 (4⁶) possible hexamers.
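The 4096 figure follows from four choices at each of six positions. The short Python sketch below enumerates a fully degenerate random domain to make the count concrete; it assumes only the four standard bases (no universal or modified bases), and `random_domain` is a hypothetical helper name.

```python
from itertools import product

BASES = "ACGT"


def random_domain(length):
    """Enumerate every sequence a fully degenerate N...N domain can take."""
    return ["".join(p) for p in product(BASES, repeat=length)]


hexamers = random_domain(6)
print(len(hexamers))              # 4096, i.e. 4**6
print(hexamers[0], hexamers[-1])  # AAAAAA TTTTTT
```

Each added random position multiplies the number of distinct primers by four, so a domain of length n contributes 4ⁿ variants to the mix.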

In certain embodiments, the primer is described by the formula:

5′-(N)n-ATG-(N)n-3′

wherein:

ATG are the consecutive nucleotides, A, T and G;

N is any deoxyribonucleotide residue, e.g., A, G, C, T; and

n is 0 or an integer from about 1 to about 10, e.g., from 1 to 8, from 2 to 7, etc., where in many embodiments n is 6.
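To see where primers of the form 5′-(N)n-ATG-(N)n-3′ could prime on a given template, one can scan both strands for every ATG that has n flanking bases on each side. The Python sketch below is a hypothetical illustration, not part of the specification: it assumes a fully degenerate flank mix (so every flank sequence is present) and exact pairing. Each reported window is written 5′ to 3′ on the strand where it occurs, and a primer with that sequence would anneal to the opposite strand.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def atg_primer_sites(template, n=6):
    """Return (strand, start, primer) for every (N)nATG(N)n window.

    The zero-width lookahead lets overlapping windows be reported. Windows
    on the '-' strand are indexed along the reverse-complement sequence.
    """
    template = template.upper()
    pattern = re.compile(r"(?=([ACGT]{%d}ATG[ACGT]{%d}))" % (n, n))
    sites = []
    for strand, seq in (("+", template), ("-", reverse_complement(template))):
        for m in pattern.finditer(seq):
            sites.append((strand, m.start(), m.group(1)))
    return sites


# Hypothetical fragment; CATG is palindromic, so the site appears on both strands
for site in atg_primer_sites("AACATGTT", n=2):
    print(site)
```

Setting n=0 reduces the search to bare ATG codons; larger n gives longer, more specific primers but requires more flanking sequence around each codon.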

As summarized above, also of interest are primers that include a consensus splice site. The consensus splice site sequences described by Pertea, M., Lin, X., and Salzberg, S. (2001) Nucleic Acids Res. 29, 1185-1190, or similar sequences conserved within the coding portion of a gene, can be included in the primer mixture or can be used in place of the (N)nATG(N)n primer described above. These primers need not hybridize to the coding strand, but they should extend into the coding portion of the gene (exon) such that they are complementary to sequences found in double stranded cDNAs. These primers can likewise be extended into non-conserved regions by the addition of (N)n nucleotides to either or both ends. A specific representative primer of this particular embodiment is 5′-ACTTACCTN-3′, where N is T or G. Yet another specific representative primer of this particular embodiment is (N)nAGG(N)n (or the complement thereof), where N and n have the meanings ascribed above.
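Because N in 5′-ACTTACCTN-3′ is restricted to T or G, the primer can be written with the IUPAC ambiguity code K (G or T) and expanded into its concrete sequences. The Python sketch below does this with a deliberately small subset of the IUPAC table; the helper names are hypothetical. Taking reverse complements gives CAGGTAAGT and AAGGTAAGT, which resemble the exon/intron (5′ donor) junction consensus, consistent with the primer annealing at the junction and extending into the exon.

```python
from itertools import product

# Small subset of the IUPAC nucleotide codes; K means G or T
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "K": "GT", "N": "ACGT"}


def expand_degenerate(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(p) for p in product(*choices)]


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# 5'-ACTTACCTN-3' with N restricted to T or G, written ACTTACCTK
splice_primers = expand_degenerate("ACTTACCTK")
print(splice_primers)                                    # ['ACTTACCTG', 'ACTTACCTT']
print([reverse_complement(p) for p in splice_primers])   # ['CAGGTAAGT', 'AAGGTAAGT']
```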

In certain embodiments, the above consensus splice site sequences are from a particular species, e.g., a mammal, such as human. Many such splice sites are known to those of skill in the art.

Programs for identifying splice sites in genomic sequences are well known to those of skill in the art, and include, but are not limited to, those described in: (1) Pertea, M., Lin, X., and Salzberg, S. (2001) Nucleic Acids Res. 29, 1185-1190; (2) Brunak et al. (1991) “Prediction of human mRNA donor and acceptor sites from the DNA sequence” J. Mol. Biol. 220, 49-65; (3) Reese, M. G. and Eeckman, F. H. (1996) “Splice Sites: A detailed neural network study” Poster at 1996 Genome Mapping & Sequencing Meeting, Cold Spring Harbor Laboratory, New York. Splice sites of interest include those that may be identified using any of the above representative or other analogous programs known to those of skill in the art.

In those embodiments where a mixture of primers is employed, a feature of the primer mixture is that it is not a random mixture. As the primer mixture is not a random mixture, at least about 5%, usually at least about 10%, about 20%, about 25%, about 30%, about 40%, about 50% or more of the primers in the mixture are known to include a conserved coding consensus region. In many embodiments, a majority of the primers in the mixture are known to include a conserved coding consensus region, such that more than 50%, e.g., at least about 60%, about 70%, about 80% or more, such as 90%, 95%, 99% or more, including all of, the primers in the mixture are known to include a conserved coding consensus region. As such, in certain embodiments, a known percentage, including a majority, of the primers in the mixture are known to include a sequence that provides for hybridization under stringent conditions to genomic regions in the genomic sample and therefore priming of nucleic acids therefrom.
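The "known percentage" of a designed mixture can be checked by counting how many primers carry a conserved motif. The Python sketch below assumes a hypothetical motif set built from the sequences named above (ATG, the ACTTACCT splice-site core, and AGG) and a hypothetical four-primer mixture; it is an illustration, not a prescribed quality-control step.

```python
# Hypothetical motif set: the start codon plus the splice-site cores named above
CONSERVED_MOTIFS = ("ATG", "ACTTACCT", "AGG")


def conserved_fraction(primers, motifs=CONSERVED_MOTIFS):
    """Fraction of primers containing at least one conserved motif."""
    if not primers:
        raise ValueError("empty primer mixture")
    hits = sum(1 for p in primers if any(m in p.upper() for m in motifs))
    return hits / len(primers)


# Hypothetical four-primer mixture; only CCCTTTGG lacks a conserved motif
mix = ["GCAATGCA", "TTAGGCTA", "CCCTTTGG", "ACTTACCTG"]
print(conserved_fraction(mix))  # 0.75, i.e. a 75% majority
```

A purely random library would also score above zero, because short motifs such as ATG occur by chance; what distinguishes the designed mixture is that its conserved fraction is known by construction rather than incidental.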

The primers described above and throughout this specification may be prepared using any suitable method, such as, for example, the known phosphotriester and phosphite triester methods, or automated embodiments thereof. In one such automated embodiment, dialkyl phosphoramidites are used as starting materials and may be synthesized as described by Beaucage et al. (1981), Tetrahedron Letters 22, 1859. One method for synthesizing oligonucleotides on a modified solid support is described in U.S. Pat. No. 4,458,066. It is also possible to use a primer that has been isolated from a biological source (such as the cleaved products of a restriction endonuclease digest). The primers herein are selected to be “substantially” complementary to each specific sequence to be amplified, i.e., the primers should be sufficiently complementary to hybridize to their respective targets. Therefore, the primer sequence need not reflect the exact sequence of the target, and can, in fact, be “degenerate.” Non-complementary bases or longer sequences can be interspersed into the primer, provided that the primer sequence has sufficient complementarity with the sequence of the target to be amplified to permit hybridization and extension.
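One way to make "substantially complementary" concrete is to score the fraction of primer bases that form Watson-Crick pairs with an equal-length target site. The Python sketch below does this under stated assumptions: the 0.85 cutoff is hypothetical, and the requirement that the 3′-terminal base be paired is a common rule of thumb for polymerase extension rather than a condition stated in this specification.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementarity(primer, target_site):
    """Fraction of primer bases that Watson-Crick pair with the target site.

    Both sequences are written 5'->3'. Pairing is antiparallel, so the
    primer's first base pairs with the last base of the target site.
    """
    if len(primer) != len(target_site):
        raise ValueError("primer and target site must be the same length")
    pairs = zip(primer.upper(), reversed(target_site.upper()))
    matches = sum(1 for p, t in pairs if COMPLEMENT.get(p) == t)
    return matches / len(primer)


def can_prime(primer, target_site, min_fraction=0.85):
    """Hypothetical rule: enough overall pairing plus a paired 3'-terminal base."""
    three_prime_paired = COMPLEMENT.get(primer[-1].upper()) == target_site[0].upper()
    return three_prime_paired and complementarity(primer, target_site) >= min_fraction


target = "AAGCTTCG"
print(complementarity("CGAAGCTT", target))  # 1.0, perfect complement
print(can_prime("CGAAGGTT", target))        # True, one internal mismatch
print(can_prime("CGAAGCTA", target))        # False, 3'-terminal mismatch
```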

As indicated above, in generating labeled probe nucleic acids according to the subject methods, the above-described genomic template and conserved primer components are employed together in a primer extension reaction that produces the desired labeled probe nucleic acids. Primer extension reactions for generating labeled nucleic acids are well known to those of skill in the art, and any convenient protocol may be employed, so long as the above described genomic template and conserved coding consensus sequence primers are employed. In this step of the subject methods, the primer is contacted with the template under conditions sufficient to extend the primer and produce a primer extension product. As such, the above primers are contacted with the genomic template in the presence of a suitable DNA polymerase under primer extension conditions sufficient to produce the desired primer extension molecules. DNA polymerases of interest include, but are not limited to, polymerases derived from E. coli, thermophilic bacteria, archaebacteria, phage, yeasts, Neurospora, Drosophila, primates and rodents; they likewise include polymerases such as reverse transcriptases and the like. The DNA polymerase extends the primer according to the genomic template to which it is hybridized in the presence of additional reagents which include, but are not limited to: dNTPs; monovalent and divalent cations, e.g., KCl, MgCl₂; sulfhydryl reagents, e.g., dithiothreitol; and buffering agents, e.g., Tris-Cl. This protocol is illustrated in FIGS. 2A and 2B.
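The strand geometry of primer extension can be sketched in silico. The Python example below is a simplified model under explicit assumptions: the primer matches the template perfectly, only the first binding site is used, and the polymerase runs off the 5′ end of the template with no kinetics, stalling, or strand displacement. Counting C residues in the newly synthesized segment marks the positions where a labeled dCTP could be incorporated. All names are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def extend_primer(template, primer):
    """Run-off extension of a perfectly matched primer on one template strand.

    Both sequences are 5'->3'. Nucleotides are added to the primer's 3' end
    while the template is copied toward its own 5' end, so the full product
    is the reverse complement of template[:site_end].
    Returns (product, newly_synthesized), or None if the primer cannot anneal.
    """
    site = reverse_complement(primer)  # template sequence the primer pairs with
    i = template.find(site)
    if i == -1:
        return None
    product = reverse_complement(template[: i + len(site)])
    return product, product[len(primer):]


# Hypothetical template and an (N)nATG-type primer (n = 2 on the 5' side)
product, new = extend_primer("TTTGGGCATCC", "GGATG")
print(product, new)    # GGATGCCCAAA CCCAAA
print(new.count("C"))  # 3 positions where a labeled dCTP could be incorporated
```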

As the subject methods are methods of producing labeled probe nucleic acids, the extension products that are produced are labeled. In these embodiments, the reagents employed in the subject primer extension reactions typically include a labeling reagent, where the labeling reagent is often a labeled oligonucleotide, which may be labeled with a directly or indirectly detectable label. A directly detectable label is one that can be directly detected without the use of additional reagents, while an indirectly detectable label is one that is detectable by employing one or more additional reagents, e.g., where the label is a member of a signal producing system made up of two or more components. In many embodiments, the label is a directly detectable label, such as a fluorescent label, where the labeling reagent employed in such embodiments is a fluorescently tagged nucleotide(s), e.g., dCTP. Fluorescent moieties which may be used to tag nucleotides for producing labeled probe nucleic acids include, but are not limited to: fluorescein, cyanine dyes such as Cy3 and Cy5, Alexa 555, Bodipy 630/650, and the like. Other labels known in the art may also be employed.

In the primer extension reactions employed in the subject methods of these embodiments, the genomic template is typically first subjected to strand dissociation conditions, e.g., a temperature ranging from about 80° C. to about 100° C., usually from about 90° C. to about 95° C., for a period of time. The resultant dissociated template molecules are then contacted with the primer molecules under annealing conditions, where the temperature of the template and primer composition is reduced to an annealing temperature of from about 20° C. to about 80° C., usually from about 37° C. to about 65° C. In certain embodiments, a “snap-cooling” protocol is employed, where the temperature is reduced to the annealing temperature, or to about 4° C. or below, in a period of from about 1 s to about 30 s, usually from about 5 s to about 10 s.

The resultant annealed primer/template hybrids are then maintained in a reaction mixture that includes the above-discussed reagents at a sufficient temperature and for a sufficient period of time to produce the desired labeled probe nucleic acids. Typically, this incubation temperature ranges from about 20° C. to about 75° C., usually from about 37° C. to about 65° C. The incubation time typically ranges from about 5 min to about 18 hr, usually from about 1 hr to about 12 hr.
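The temperature and time windows given above can be collected into a simple checker. The following Python sketch is illustrative only: the `Step` record, the step names and the `check_program` helper are hypothetical, and only the broad ranges stated in the text are encoded (denaturation about 80-100° C., annealing about 20-80° C. or snap-cooling to about 4° C. or below, incubation about 20-75° C. for about 5 min to about 18 hr). It is applied to the program used in the Experimental section (95° C., ice bath, then 42° C. for 60 minutes).

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str        # "denature", "anneal" or "extend" (hypothetical labels)
    temp_c: float    # temperature in degrees C
    minutes: float   # hold time in minutes


# Broad ranges taken from the text, as (low, high) in degrees C.
TEMP_RANGES = {"denature": (80.0, 100.0), "anneal": (20.0, 80.0), "extend": (20.0, 75.0)}
SNAP_COOL_MAX_C = 4.0              # snap-cooling "to about 4 deg C or below"
EXTEND_MINUTES = (5.0, 18 * 60.0)  # about 5 min to about 18 hr


def check_program(steps: list[Step]) -> list[str]:
    """Return human-readable problems; an empty list means every step fits the ranges."""
    problems = []
    for step in steps:
        low, high = TEMP_RANGES[step.name]
        snap_cooled = step.name == "anneal" and step.temp_c <= SNAP_COOL_MAX_C
        if not snap_cooled and not low <= step.temp_c <= high:
            problems.append(f"{step.name}: {step.temp_c} C outside {low}-{high} C")
        if step.name == "extend" and not EXTEND_MINUTES[0] <= step.minutes <= EXTEND_MINUTES[1]:
            problems.append(f"extend: {step.minutes} min outside {EXTEND_MINUTES} min")
    return problems


# Program from the Experimental section: 95 C, ice bath for 10 min, 42 C for 60 min.
experimental = [Step("denature", 95, 5), Step("anneal", 0, 10), Step("extend", 42, 60)]
print(check_program(experimental))  # []
```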

The above protocol results in the production of labeled probe nucleic acids. Where desired, the resultant labeled probe nucleic acids may be separated from the remainder of the reaction mixture, where any convenient separation protocol may be employed.

The above protocol results in the production of a select population of labeled probe nucleic acids corresponding to genes, and more specifically to coding regions within genes, from an initial genomic template. A representative protocol is shown in FIG. 1.

Utility

The resultant labeled nucleic acid populations find use in a variety of different applications.

One representative application in which the subject methods find use is the quantitative comparison of the copy number of one nucleic acid sequence in a first collection of nucleic acid molecules relative to the copy number of the same sequence in a second collection. An advantage of the method is that at least 5, sometimes more than 10, usually more than 100, and sometimes more than 1000 copy number comparisons can be made in one hybridization experiment. In these applications, the subject methods are employed to produce at least a first collection of labeled probe nucleic acids and a second collection of labeled probe nucleic acids, where the labels of the first and second collections are distinguishable from each other. The collections or populations of labeled probe nucleic acids produced by the subject methods are contacted with a plurality of target elements under conditions such that nucleic acid hybridization to the target elements can occur. The probes can be contacted with the target elements either simultaneously or serially, where in many embodiments the probe compositions are contacted with the array of targets simultaneously. As such, the present invention may be used in methods of comparing nucleic acid copy number to identify abnormalities and of mapping chromosomal abnormalities associated with disease. In many embodiments, the subject labeling methods are employed in applications that use target nucleic acids immobilized on a solid support, to which differentially labeled probe nucleic acids produced as described above are hybridized.

Hybridization is carried out under suitable hybridization conditions, which may vary in stringency as desired. As such, in certain embodiments highly stringent hybridization conditions may be employed, while in other embodiments low stringency hybridization conditions may be employed. The term “high stringent hybridization conditions” as used herein refers to conditions that are compatible with producing duplexes on an array surface between complementary binding members, i.e., between probes and complementary targets in a sample, e.g., duplexes of nucleic acid probes, such as DNA probes, and their corresponding nucleic acid targets that are present in the sample, e.g., their corresponding mRNA analytes present in the sample. An example of high stringent hybridization conditions is hybridization at 50° C. or higher and 0.1×SSC (15 mM sodium chloride/1.5 mM sodium citrate). Another example of high stringent hybridization conditions is overnight incubation at 42° C. in a solution of 50% formamide, 5×SSC (750 mM NaCl, 75 mM trisodium citrate), 50 mM sodium phosphate (pH 7.6), 5× Denhardt's solution and 10% dextran sulfate, followed by washing the filters in 0.1×SSC at about 65° C. High stringent hybridization conditions are hybridization conditions that are at least as stringent as the above representative conditions. Other stringent hybridization conditions are known in the art and may also be employed to identify nucleic acids of this particular embodiment of the invention.

In certain embodiments, hybridization is carried out under low stringency conditions. Representative low stringency conditions include, but are not limited to: (a) hybridization at 5° C. below ½Tm; (b) hybridization at 50° C. and 6×SSC (0.9 M sodium chloride/0.09 M sodium citrate); and (c) analogous conditions, which can readily be determined by those of skill in the art. The reduced complexity of the probe population enables the binding equilibrium to be shifted in favor of association (lower stringency) in the hybridization reaction without compromising the quality of the result.
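The salt concentrations quoted for the SSC buffers follow from the standard composition of 1×SSC (150 mM sodium chloride, 15 mM sodium citrate) scaled linearly with strength. A short Python sketch of that arithmetic, together with a C1·V1 = C2·V2 dilution from a concentrated stock, is shown below; the 20× stock and the 500 ml final volume are assumptions chosen for illustration.

```python
# 1x SSC = 150 mM NaCl + 15 mM trisodium citrate; other strengths scale linearly.
NACL_MM_PER_X = 150.0
CITRATE_MM_PER_X = 15.0


def ssc_composition(strength_x: float) -> tuple[float, float]:
    """Return (NaCl mM, sodium citrate mM) for an SSC solution of the given strength."""
    return strength_x * NACL_MM_PER_X, strength_x * CITRATE_MM_PER_X


def stock_volume_ml(stock_x: float, final_x: float, final_volume_ml: float) -> float:
    """Volume of concentrated SSC stock needed, from C1*V1 = C2*V2."""
    return final_volume_ml * final_x / stock_x


for strength in (0.1, 6):
    nacl, citrate = ssc_composition(strength)
    print(f"{strength}x SSC: {nacl:g} mM NaCl, {citrate:g} mM citrate")

# Hypothetical example: 500 ml of 0.1x wash buffer from a 20x stock.
print(f"{stock_volume_ml(20, 0.1, 500):g} ml of 20x SSC")
```

The loop reproduces 15 mM/1.5 mM for 0.1×SSC and 900 mM/90 mM (0.9 M/0.09 M) for 6×SSC, the values given in the text.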

The hybridization of the labeled nucleic acids to the target is then detected using standard techniques. Such applications compare the copy numbers of sequences capable of binding to the target elements. Variations in copy number detectable by the methods of the invention may arise in different ways. For example, copy number may be altered as a result of amplification or deletion of a chromosomal region. Alternatively, copy number may be reduced by genetic rearrangements that alter the sequences in the probe or target nucleic acids sufficiently to reduce their binding.

As such, the method may be used for mutation detection, and is most useful for the analysis of multiple gene loci, for example in molecular breeding programs, or in the mapping or identification of genes responsible for polygenic traits.

Target nucleic acids employed in such applications can be derived from virtually any source. Typically, the targets will be nucleic acid molecules derived from representative locations along a chromosome of interest, a chromosomal region of interest, an entire genome of interest, a cDNA library, and the like. These target nucleic acids may be relatively long (typically thousands of bases) fragments of nucleic acid obtained from, for instance, inter-Alu PCR products of genomic clones, restriction digests of genomic clones, cDNA clones and the like. In some embodiments the target nucleic acids are a previously mapped library of clones spanning a particular region of interest.

The choice of target nucleic acids to use may be influenced by prior knowledge of the association of a particular chromosome or chromosomal region with certain disease conditions. International Application WO 93/18186 provides a list of chromosomal abnormalities and associated diseases, which are described in the scientific literature. Alternatively, whole genome screening to identify new regions subject to frequent changes in copy number can be performed using the methods of the present invention. In these embodiments, target elements usually contain nucleic acids representative of locations distributed over the entire genome. In some embodiments (e.g., using a large number of target elements of high complexity) all sequences in the genome can be present in the array.

In some embodiments, previously mapped clones from a particular chromosomal region of interest are used as targets. Such clones are becoming available as a result of rapid progress of the worldwide initiative in genomics. Mapped clones can be prepared from libraries constructed from single chromosomes, multiple chromosomes, or from a segment of a chromosome. Standard techniques are used to clone suitably sized fragments in vectors such as cosmids, yeast artificial chromosomes (YACs), bacterial artificial chromosomes (BACs) and P1 phage. While it is possible to generate clone libraries, as described above, libraries spanning entire chromosomes are also available commercially. For instance, chromosome-specific libraries from the human and other genomes are available from Clontech (South San Francisco, Calif.) or from The American Type Culture Collection (see, ATCC/NIH Repository of Catalogue of Human and Mouse DNA Probes and Libraries, 7th ed. 1993). If necessary, clones described above may be genetically or physically mapped. For instance, FISH and digital image analysis can be used to localize cosmids along the desired chromosome. This method is described, for instance, in Lichter et al., Science, 247:64-69 (1990). The physically mapped clones can then be used to more finely map a region of interest identified using CGH or other methods.

The targets employed in the subject methods are immobilized on a solid support. Many methods for immobilizing nucleic acids on a variety of solid surfaces are known in the art. For instance, the solid surface may be a membrane, glass, plastic, or a bead. The desired component may be covalently bound or noncovalently attached through nonspecific binding. The immobilization of nucleic acids on solid surfaces is discussed more fully below.

A wide variety of organic and inorganic polymers, as well as other materials, both natural and synthetic, may be employed as the material for the solid surface. Illustrative solid surfaces include nitrocellulose, nylon, glass, diazotized membranes (paper or nylon), silicones, polyformaldehyde, cellulose, and cellulose acetate. In addition, plastics such as polyethylene, polypropylene, polystyrene, and the like can be used. Other materials which may be employed include paper, ceramics, metals, metalloids, semiconductive materials, cermets or the like. In addition, substances that form gels can be used. Such materials include proteins (e.g., gelatins), lipopolysaccharides, silicates, agarose and polyacrylamides. Where the solid surface is porous, various pore sizes may be employed depending upon the nature of the system.

In preparing the surface, a plurality of different materials may be employed, particularly as laminates, to obtain various properties. For example, proteins (e.g., bovine serum albumin) or mixtures of macromolecules (e.g., Denhardt's solution) can be employed to avoid non-specific binding, simplify covalent conjugation, enhance signal detection or the like.

If covalent bonding between a compound and the surface is desired, the surface will usually be polyfunctional or be capable of being polyfunctionalized. Functional groups which may be present on the surface and used for linking can include carboxylic acids, aldehydes, amino groups, cyano groups, ethylenic groups, hydroxyl groups, mercapto groups and the like. The manner of linking a wide variety of compounds to various surfaces is well known and is amply illustrated in the literature. For example, methods for immobilizing nucleic acids by introduction of various functional groups to the molecules are known (see, e.g., Bischoff et al., Anal. Biochem. 164:336-344 (1987); Kremsky et al., Nuc. Acids Res. 15:2891-2910 (1987)). Modified nucleotides can be placed on the target using PCR primers containing the modified nucleotide, or by enzymatic end labeling with modified nucleotides.

Use of membrane supports (e.g., nitrocellulose, nylon, polypropylene) for the nucleic acid arrays of the invention is advantageous in certain embodiments because of well developed technology employing manual and robotic methods of arraying targets at relatively high element densities (e.g., up to 30-40/cm²). In addition, such membranes are generally available, and protocols and equipment for hybridization to membranes are well known. Many membrane materials, however, have considerable fluorescence emission, which is a drawback where fluorescent labels are used to detect hybridization.

To optimize a given assay format, one of skill can determine the sensitivity of fluorescence detection for different combinations of membrane type, fluorochrome, excitation and emission bands, spot size and the like. In addition, low fluorescence background membranes have been described (see, e.g., Chu et al., Electrophoresis 13:105-114 (1992)).

The sensitivity for detection of spots of various diameters on the candidate membranes can be readily determined by, for example, spotting a dilution series of fluorescently end-labeled DNA fragments. These spots are then imaged using conventional fluorescence microscopy. The sensitivity, linearity, and dynamic range achievable from the various combinations of fluorochrome and membrane can thus be determined. Serial dilutions of pairs of fluorochromes in known relative proportions can also be analyzed to determine the accuracy with which fluorescence ratio measurements reflect actual fluorochrome ratios over the dynamic range permitted by the detectors and membrane fluorescence.
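One way to summarize the linearity measured from such a dilution series is the slope of log(signal) against log(amount): a slope near 1 indicates a response proportional to the amount spotted, while a slope below 1 indicates compression such as detector or membrane saturation. The Python sketch below assumes a two-fold dilution series and uses synthetic intensities purely to exercise the code; the function names are hypothetical and no measured data are implied.

```python
import math


def dilution_series(start: float, factor: float, n: int) -> list[float]:
    """Amounts for an n-step serial dilution: start, start/factor, start/factor**2, ..."""
    return [start / factor**i for i in range(n)]


def loglog_slope(amounts: list[float], signals: list[float]) -> float:
    """Least-squares slope of log10(signal) versus log10(amount)."""
    xs = [math.log10(a) for a in amounts]
    ys = [math.log10(s) for s in signals]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


amounts = dilution_series(start=100.0, factor=2.0, n=8)  # hypothetical units per spot
proportional = [35.0 * a for a in amounts]               # synthetic, perfectly linear response
saturating = [1000.0 * a / (a + 20.0) for a in amounts]  # synthetic response that saturates
print(round(loglog_slope(amounts, proportional), 3))     # 1.0
print(round(loglog_slope(amounts, saturating), 3))
```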

Arrays on substrates with much lower fluorescence than membranes, such as glass, quartz, or small beads, can achieve much better sensitivity. For example, elements of various sizes, ranging from about 1 mm diameter down to about 1 μm, can be used with these materials. Small array members containing small amounts of concentrated target DNA are conveniently used for high complexity comparative hybridizations, since the total amount of probe available for binding to each element will be limited. Thus it is advantageous to have small array members that contain a small amount of concentrated target DNA so that the signal that is obtained is highly localized and bright. Such small array members are typically used in arrays with densities greater than 10⁴/cm². Relatively simple approaches capable of quantitative fluorescent imaging of 1 cm² areas have been described that permit acquisition of data from a large number of members in a single image (see, e.g., Wittrup et al., Cytometry 16:206-213 (1994)).
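The element densities mentioned above translate directly into feature spacing. Assuming a square grid (an assumption; other layouts pack differently), the density is the reciprocal of the squared center-to-center pitch, so a density of 10⁴/cm² corresponds to a pitch of 100 μm. A minimal Python sketch of this conversion, with hypothetical function names:

```python
import math

UM_PER_CM = 10_000.0


def density_per_cm2(pitch_um: float) -> float:
    """Elements per cm^2 for a square grid with the given center-to-center pitch (micrometres)."""
    pitch_cm = pitch_um / UM_PER_CM
    return 1.0 / pitch_cm**2


def pitch_for_density_um(density: float) -> float:
    """Largest square-grid pitch (micrometres) that still reaches the requested density."""
    return UM_PER_CM / math.sqrt(density)


print(pitch_for_density_um(1e4))   # 100.0
print(round(density_per_cm2(50)))  # 40000
```

Spots of about 1 mm diameter cannot be placed closer than about 1 mm apart, which caps a square grid at roughly 100 elements per cm²; exceeding 10⁴/cm² requires spots smaller than 100 μm.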

Covalent attachment of the target nucleic acids to glass or synthetic fused silica can be accomplished according to a number of known techniques. Such substrates provide a very low fluorescence substrate and a highly efficient hybridization environment.

There are many possible approaches to coupling nucleic acids to glass that employ commercially available reagents. For instance, materials for preparation of silanized glass with a number of functional groups are commercially available or can be prepared using standard techniques. Alternatively, quartz cover slips, which have at least 10-fold lower autofluorescence than glass, can be silanized.

The targets can also be immobilized on commercially available coated beads or other surfaces. For instance, biotin end-labeled nucleic acids can be bound to commercially available avidin-coated beads. Streptavidin or anti-digoxigenin antibody can also be attached to silanized glass slides by protein-mediated coupling using, e.g., protein A following standard protocols (see, e.g., Smith et al., Science, 258:1122-1126 (1992)). Biotin or digoxigenin end-labeled nucleic acids can be prepared according to standard techniques. Hybridization to nucleic acids attached to beads is accomplished by suspending them in the hybridization mix, and then depositing them on the glass substrate for analysis after washing. Alternatively, paramagnetic particles, such as ferric oxide particles, with or without avidin coating, can be used.

The copy number of particular nucleic acid sequences in two probe collections prepared according to the subject methods is compared by hybridizing the probes to one or more target nucleic acid arrays, as described above. The hybridization signal intensity, and the ratio of intensities, produced by the probes on each of the target elements is determined. Since signal intensities on a target element can be influenced by factors other than the copy number of a probe in solution, it is preferred to conduct an analysis where two labeled populations are present with distinct labels. Thus, comparison of the signal intensity ratios among target elements permits comparison of copy number ratios of different sequences in the probe populations.
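The ratio comparison described above can be sketched in a few lines of Python. The example is illustrative: the intensity values are made up, and normalizing by the median ratio assumes that most target elements are present at equal copy number in both probe populations, a common convention rather than something specified here. After normalization, a log2 ratio of 0 indicates equal copy number, +1 a two-fold gain and -1 a two-fold loss in the test population.

```python
import math
from statistics import median


def normalized_log2_ratios(test: list[float], reference: list[float]) -> list[float]:
    """Per-element log2(test/reference), centred so that the median element is 0."""
    ratios = [t / r for t, r in zip(test, reference)]
    centre = median(ratios)  # assumes most elements are at balanced copy number
    return [math.log2(r / centre) for r in ratios]


# Made-up intensities for four target elements (test label, reference label).
test_signal = [200.0, 400.0, 100.0, 210.0]
reference_signal = [200.0, 200.0, 200.0, 210.0]
print(normalized_log2_ratios(test_signal, reference_signal))  # [0.0, 1.0, -1.0, 0.0]
```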

Standard hybridization techniques are used to probe a target nucleic acid array. Suitable methods are described in references describing CGH techniques (Kallioniemi et al., Science 258:818-821 (1992) and WO 93/18186). Several guides to general techniques are available, e.g., Tijssen, Hybridization with Nucleic Acid Probes, Parts I and II (Elsevier, Amsterdam 1993). For descriptions of techniques suitable for in situ hybridizations, see Gall et al., Meth. Enzymol., 21:470-480 (1981) and Angerer et al. in Genetic Engineering: Principles and Methods, Setlow and Hollaender, Eds., Vol. 7, pgs. 43-65 (Plenum Press, New York 1985). See also U.S. Pat. Nos. 6,335,167; 6,197,501; 5,830,645; and 5,665,549; the disclosures of which are herein incorporated by reference.

Generally, nucleic acid hybridizations comprise the following major steps: (1) immobilization of target nucleic acids; (2) prehybridization treatment to increase accessibility of target DNA and to reduce nonspecific binding; (3) hybridization of the mixture of nucleic acids to the nucleic acid on the solid surface; (4) posthybridization washes to remove nucleic acid fragments not bound in the hybridization; and (5) detection of the hybridized nucleic acid fragments. The reagents used in each of these steps and their conditions for use vary depending on the particular application.

Reading of the resultant hybridized array may be accomplished by illuminating the array and reading the location and intensity of resulting fluorescence at each feature of the array to detect any binding complexes on the surface of the array. For example, a scanner may be used for this purpose which is similar to the AGILENT MICROARRAY SCANNER scanner available from Agilent Technologies, Palo Alto, Calif. Other suitable apparatus and methods are described in U.S. patent applications: Ser. No. 09/846,125 “Reading Multi-Featured Arrays” by Dorsel et al.; and Ser. No. 09/430,214 “Interrogating Multi-Featured Arrays” by Dorsel et al., which references are incorporated herein by reference. However, arrays may be read by any other method or apparatus than the foregoing, with other reading methods including other optical techniques (for example, detecting chemiluminescent or electroluminescent labels) or electrical techniques (where each feature is provided with an electrode to detect hybridization at that feature in a manner disclosed in U.S. Pat. No. 6,221,583 and elsewhere).

Results from the reading or evaluating may be raw results (such as fluorescence intensity readings for each feature in one or more color channels) or may be processed results such as obtained by rejecting a reading for a feature which is below a predetermined threshold and/or forming conclusions based on the pattern read from the array (such as whether or not a particular target sequence may have been present in the sample, or whether or not a pattern indicates a particular condition of an organism from which the sample came).
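Producing processed results from raw readings can be as simple as dropping features whose signal falls below a predetermined threshold. In the hypothetical Python sketch below, each feature has one reading per color channel, and a feature is rejected if either channel is below the threshold; both the data layout and the "either channel" rule are assumptions made for illustration.

```python
def filter_features(
    readings: dict[str, tuple[float, float]], threshold: float
) -> tuple[dict[str, tuple[float, float]], list[str]]:
    """Split features into (kept readings, rejected feature ids) using a per-channel threshold."""
    kept: dict[str, tuple[float, float]] = {}
    rejected: list[str] = []
    for feature_id, (channel_1, channel_2) in readings.items():
        if channel_1 < threshold or channel_2 < threshold:
            rejected.append(feature_id)
        else:
            kept[feature_id] = (channel_1, channel_2)
    return kept, rejected


raw = {"f1": (500.0, 480.0), "f2": (30.0, 600.0), "f3": (700.0, 50.0)}
kept, rejected = filter_features(raw, threshold=100.0)
print(sorted(kept), rejected)  # ['f1'] ['f2', 'f3']
```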

In certain embodiments, the subject methods include a step of transmitting data or results from at least one of the detecting and deriving steps, also referred to herein as evaluating, as described above, to a remote location. By “remote location” is meant a location other than the location at which the array is present and hybridization occurs. For example, a remote location could be another location (e.g., office, lab, etc.) in the same city, another location in a different city, another location in a different state, another location in a different country, etc. As such, when one item is indicated as being “remote” from another, what is meant is that the two items are at least in different buildings, and may be at least one mile, ten miles, or at least one hundred miles apart.

“Communicating” information means transmitting the data representing that information as electrical signals over a suitable communication channel (for example, a private or public network). “Forwarding” an item refers to any means of getting that item from one location to the next, whether by physically transporting that item or otherwise (where that is possible), and includes, at least in the case of data, physically transporting a medium carrying the data or communicating the data. The data may be transmitted to the remote location for further evaluation and/or use. Any convenient telecommunications means may be employed for transmitting the data, e.g., facsimile, modem, internet, etc.

Analysis of processed results of the described hybridization experiments provides information about the relative copy number of nucleic acid domains, e.g., genes, in genomes.

Kits

Also provided are kits for use in the subject invention, where such kits may comprise containers, each with one or more of the various reagents (typically in concentrated form) utilized in the methods, where such reagents include, but are not limited to, the subject conserved primers, buffers, the appropriate nucleotide triphosphates (e.g., dATP, dCTP, dGTP, dTTP), DNA polymerase, labeling reagents, e.g., labeled nucleotides, and the like. Where the kits are specifically designed for use in CGH applications, the kits may further include labeling reagents for making two or more collections of distinguishably labeled nucleic acids according to the subject methods, an array of target nucleic acids, hybridization solution, etc.

Finally, the kits may further include instructions for using the kit components in the subject methods. The instructions may be printed on a substrate, such as paper or plastic, etc. As such, the instructions may be present in the kits as a package insert, in the labeling of the container of the kit or components thereof (i.e., associated with the packaging or sub-packaging), etc. In other embodiments, the instructions are present as an electronic storage data file present on a suitable computer readable storage medium, e.g., CD-ROM, diskette, etc.

The following examples are offered by way of illustration and not by way of limitation.

EXPERIMENTAL

Genomic DNA from male and female cells, isolated using traditional methods (e.g., Trizol, Qiagen), is restriction digested with 12.5 units of Alu I and Rsa I in the 1× buffer provided by the vendor overnight at 37° C. The digested DNA is purified using the Qiagen PCR Purification kit and concentrated to a final concentration of >0.3 mg/ml in a Speed-Vac.
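Alu I and Rsa I both recognize four-base palindromic sites and cut bluntly in the middle (AG^CT and GT^AC, respectively), so the fragments expected from a known sequence can be predicted in silico. The Python sketch below is a hypothetical helper for illustration; it scans one strand only, which is sufficient because both sites are palindromic, and it ignores methylation effects and incomplete digestion.

```python
import re

# Recognition site and cut offset (bases from the 5' end of the site) on the top strand.
ENZYMES = {"AluI": ("AGCT", 2), "RsaI": ("GTAC", 2)}


def digest(sequence: str, enzymes: tuple[str, ...] = ("AluI", "RsaI")) -> list[str]:
    """Return the top-strand fragments produced by complete digestion with the given enzymes."""
    seq = sequence.upper()
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        # The zero-width lookahead also finds overlapping occurrences of the site.
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    bounds = [0, *sorted(cuts), len(seq)]
    return [seq[start:end] for start, end in zip(bounds, bounds[1:])]


print(digest("TTAGCTTTGTACGG"))  # ['TTAG', 'CTTTGT', 'ACGG']
```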

Two labeling reactions are carried out as described below, the male sample to be ultimately labeled with Cyanine 3 and the female with Cyanine 5. A solution containing 6 μg of digested DNA is transferred to a microfuge tube containing 10 μg of conserved primer mix made up of equal amounts of 5′-NNNATG-3′ primer and 5′-ACTTACC-3′ primer. The solution is heated to 95° C. for 3-5 minutes and quick cooled by transfer to an ice bath. After 10 minutes on ice, reaction components are added to achieve the following final concentrations: 50 μM each of dATP, dTTP and dGTP, 25 μM dCTP, 25 μM labeled dCTP, 1× MMLV reaction buffer and 200 U MMLV-RT. The reaction is transferred to a 42° C. water bath and allowed to proceed for 60 minutes. Following the reactions, the solutions are pooled and the labeled components are purified using the Qiagen PCR Purification kit and concentrated as described previously.
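Reaching the final nucleotide concentrations listed above is a C1·V1 = C2·V2 calculation for each component. The Python sketch below assumes a 50 μl reaction and hypothetical stock concentrations (10 mM unlabeled dNTPs, 1 mM labeled dCTP); neither the reaction volume nor the stock concentrations are specified in the protocol. Note that 25 μM dCTP plus 25 μM labeled dCTP gives 50 μM total dCTP, matching each of the other three dNTPs.

```python
REACTION_UL = 50.0  # hypothetical final reaction volume in microlitres

# Final concentrations from the protocol (uM) and hypothetical stock concentrations (uM).
FINAL_UM = {"dATP": 50.0, "dTTP": 50.0, "dGTP": 50.0, "dCTP": 25.0, "labeled dCTP": 25.0}
STOCK_UM = {"dATP": 10_000.0, "dTTP": 10_000.0, "dGTP": 10_000.0,
            "dCTP": 10_000.0, "labeled dCTP": 1_000.0}


def stock_volume_ul(final_um: float, stock_um: float, reaction_ul: float = REACTION_UL) -> float:
    """Volume of stock that brings a component to final_um in the reaction (C1*V1 = C2*V2)."""
    return final_um * reaction_ul / stock_um


volumes = {name: stock_volume_ul(FINAL_UM[name], STOCK_UM[name]) for name in FINAL_UM}
for name, volume in volumes.items():
    print(f"{name}: {volume:.3g} ul")

total_c_um = FINAL_UM["dCTP"] + FINAL_UM["labeled dCTP"]
print(total_c_um)  # 50.0, equal to each of the other dNTPs
```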

The labeled products are then denatured at 95° C. for 5 minutes, diluted into Agilent's Deposition Hybridization buffer and transferred to an Agilent Human 1 cDNA microarray. The array is allowed to hybridize overnight at 60° C., washed, scanned and feature-extracted according to the manufacturer's instructions.

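Genes present on the X and Y chromosomes are recognized as having either higher Cyanine 3 or higher Cyanine 5 signals: because the male sample is labeled with Cyanine 3 and the female sample with Cyanine 5, Y-linked genes show higher Cyanine 3 signal, while X-linked genes, present in two copies in female cells and one copy in male cells, show higher Cyanine 5 signal. Genes present on the other chromosomes have a balance of Cyanine 3 and Cyanine 5 signals.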
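With the male sample labeled with Cyanine 3 and the female sample with Cyanine 5, an X-linked gene ideally approaches a 2:1 Cyanine 5 to Cyanine 3 ratio (log2 of about +1), a Y-linked gene should show little Cyanine 5 signal above background, and autosomal genes should sit near log2 = 0. A minimal Python classifier along these lines is sketched below; the ±0.5 log2 cutoff and the function name are hypothetical choices, not values taken from the experiment, and the inputs are assumed to be background-subtracted intensities greater than zero.

```python
import math


def classify_feature(cy3: float, cy5: float, cutoff: float = 0.5) -> str:
    """Label a feature from log2(Cy5/Cy3); male sample = Cy3, female sample = Cy5.

    Both intensities are assumed to be positive, background-subtracted values.
    """
    log_ratio = math.log2(cy5 / cy3)
    if log_ratio >= cutoff:
        return "Cy5-high"   # e.g. X-linked: two copies in female vs one in male
    if log_ratio <= -cutoff:
        return "Cy3-high"   # e.g. Y-linked: present in male only
    return "balanced"       # e.g. autosomal: equal copy number


print(classify_feature(1000.0, 2000.0))  # Cy5-high
print(classify_feature(1000.0, 100.0))   # Cy3-high
print(classify_feature(1000.0, 1100.0))  # balanced
```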

The above results and discussion demonstrate that novel methods of producing labeled probe nucleic acids from genomic template are provided, where advantages of the subject methods include the feature that the resulting populations are less complex than populations produced from genomic template by other methods, such as nick translation or random primer extension, and are therefore more suitable for use in immobilized target array based CGH applications. As such, the subject methods represent a significant contribution to the art.

All publications and patent applications cited in this specification are herein incorporated by reference as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference. The citation of any publication is for its disclosure prior to the filing date and should not be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention.

Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it is readily apparent to those of ordinary skill in the art in light of the teachings of this invention that certain changes and modifications may be made thereto without departing from the spirit or scope of the appended claims.

1-17. (canceled)
18. An oligonucleotide primer comprising a conserved coding consensus region.
19. The oligonucleotide primer according to claim 18, wherein said primer ranges in length from about 5 to about 20 nt.
20. The oligonucleotide primer according to claim 19, wherein said primer further comprises at least one of a 5′ and a 3′ random sequence, each of which independently ranges in length from about 1 to about 10 nt.
21. The oligonucleotide primer according to claim 19, wherein said primer is an ATG-comprising primer.
22. The oligonucleotide primer according to claim 19, wherein said primer comprises a consensus splice site.
23. A kit for use in comparing the relative copy number of nucleic acid sequences in two or more collections of nucleic acid molecules, said kit comprising: (a) an oligonucleotide primer comprising a conserved coding consensus region; and (b) instructions for practicing the method according to claim 1.
24. The kit according to claim 23, wherein said kit further comprises first and second nucleic acid labeling reagents having distinguishable labels.
25. The kit according to claim 24, wherein said distinguishable labels are fluorescent distinguishable labels.
26. The kit according to claim 23, wherein said kit further comprises a plurality of target elements bound to a solid surface.